Relational Sequence Alignments and Logos
Abstract
The need to measure sequence similarity arises in many application domains and often coincides with sequence alignment: the more similar two sequences are, the better they can be aligned. Aligning sequences shows not only how similar they are but also where they differ and where they correspond. Traditionally, alignment has been considered for sequences of flat symbols only. Many real-world sequences, such as natural language sentences and protein secondary structures, however, exhibit rich internal structure. This is akin to the problem of dealing with structured examples studied in the field of inductive logic programming (ILP). In this paper, we introduce Real, a powerful yet simple approach to aligning sequences of structured symbols using well-established ILP distance measures within traditional alignment methods. Although the approach is straightforward, experiments on protein data and Medline abstracts show that it works well in practice, that the resulting alignments can indeed provide more information than flat ones, and that they are meaningful to experts when represented graphically.
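The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes ground terms encoded as Python tuples of the form (functor, arg1, ..., argn), uses a Nienhuys-Cheng-style term distance as one example of an ILP distance measure, and plugs it into a standard Needleman-Wunsch global alignment. The names `term_distance` and `align`, the gap cost of 0.5, and the example terms are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple, Union

# Assumed encoding: a ground term is a constant (str) or a tuple
# (functor, arg1, ..., argn). This encoding is illustrative only.
Term = Union[str, tuple]


def term_distance(s: Term, t: Term) -> float:
    """Nienhuys-Cheng-style distance between two ground terms, in [0, 1]."""
    if s == t:
        return 0.0
    # Distinct constants, or a constant versus a compound term.
    if isinstance(s, str) or isinstance(t, str):
        return 1.0
    # Compound terms with a different functor or arity.
    if s[0] != t[0] or len(s) != len(t):
        return 1.0
    # Same functor and arity: average the argument distances, halved.
    args = list(zip(s[1:], t[1:]))
    return sum(term_distance(a, b) for a, b in args) / (2 * len(args))


def align(xs: List[Term], ys: List[Term], gap: float = 0.5
          ) -> Tuple[float, List[Tuple[Optional[Term], Optional[Term]]]]:
    """Global alignment that minimises total term distance plus gap costs."""
    n, m = len(xs), len(ys)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (cost[i - 1][j - 1] + term_distance(xs[i - 1], ys[j - 1]), "diag"),
                (cost[i - 1][j] + gap, "up"),      # xs[i-1] aligned to a gap
                (cost[i][j - 1] + gap, "left"),    # ys[j-1] aligned to a gap
            ]
            cost[i][j], move[i][j] = min(options)
    # Follow the stored back-pointers from the bottom-right corner.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "diag":
            pairs.append((xs[i - 1], ys[j - 1]))
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            pairs.append((xs[i - 1], None))
            i -= 1
        else:
            pairs.append((None, ys[j - 1]))
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs


# Hypothetical secondary-structure sequences written as structured symbols.
a = [("helix", "alpha", "long"), ("strand", "plus", "short"), ("helix", "alpha", "short")]
b = [("helix", "alpha", "short"), ("helix", "alpha", "short")]
total, pairs = align(a, b)
for left, right in pairs:
    print(left, "<->", right)
```

In this toy example the two helices that differ only in length are matched at a small cost (0.25) instead of being treated as unrelated symbols, which is the kind of extra information a flat alignment cannot express.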
Similar resources
Visualizing bacterial tRNA identity determinants and antideterminants using function logos and inverse function logos
Sequence logos are stacked bar graphs that generalize the notion of consensus sequence. They employ entropy statistics very effectively to display variation in a structural alignment of sequences of a common function, while emphasizing its over-represented features. Yet sequence logos cannot display features that distinguish functional subclasses within a structurally related superfamily nor do...
ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos
BACKGROUND: The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment ...
webPRC: the Profile Comparer for alignment-based searching of public domain databases
Profile-profile methods are well suited to detect remote evolutionary relationships between protein families. Profile Comparer (PRC) is an existing stand-alone program for scoring and aligning hidden Markov models (HMMs), which are based on multiple sequence alignments. Since PRC compares profile HMMs instead of sequences, it can be used to find distant homologues. For this purpose, PRC is used...
Spial: analysis of subtype-specific features in multiple sequence alignments of proteins
MOTIVATION: Spial (Specificity in alignments) is a tool for the comparative analysis of two alignments of evolutionarily related sequences that differ in their function, such as two receptor subtypes. It highlights functionally important residues that are either specific to one of the two alignments or conserved across both alignments. It permits visualization of this information in three comple...
13 Comparative RNA analysis
• R. Durbin, S. Eddy, A. Krogh and G. Mitchison, Biological sequence analysis, Cambridge, 1998 • D.W. Mount, Bioinformatics: Sequences and Genome analysis, 2001 • V. Bafna, S. Muthukrishnan, R. Ravi, Computing similarity between RNA strings • D. Sankoff, Simultaneous solution of the RNA Folding, Alignment and Protosequence Problems, SIAM Journal of Appl. Math., 45, 5, 1985 • J. Gorodkin, L.J. ...